Abstract

We present DeblurGAN, an end-to-end learned method for motion deblurring. The learning is based on a conditional GAN and a content loss. DeblurGAN achieves state-of-the-art performance both in the structural similarity measure and in visual appearance. The quality of the deblurring model is also evaluated in a novel way on a real-world problem: object detection on (de-)blurred images. The method is 5 times faster than the closest competitor, DeepDeblur [25]. We also introduce a novel method for generating synthetic motion blurred images from sharp ones, allowing realistic dataset augmentation.

The model, code and dataset are available at https://github.com/KupynOrest/DeblurGAN

1. Introduction

This work is on blind motion deblurring of a single photograph. Significant progress has recently been achieved in the related areas of image super-resolution [20] and inpainting [45] by applying generative adversarial networks (GANs) [10]. GANs are known for their ability to preserve texture details in images and to create solutions that are close to the real image manifold and look perceptually convincing. Inspired by recent work on image super-resolution [20] and image-to-image translation by generative adversarial networks [16], we treat deblurring as a special case of such image-to-image translation. We present DeblurGAN, an approach based on conditional generative adversarial networks [24] and a multi-component loss function. Unlike previous work, we use Wasserstein GAN [2] with the gradient penalty [11] and perceptual loss [17]. This encourages solutions which are perceptually hard to distinguish from real sharp images and allows restoring finer texture details than traditional MSE or MAE optimization targets.


Figure 1: DeblurGAN helps object detection. YOLO [30] detections on the blurred image (top), the DeblurGAN-restored image (middle) and the sharp ground truth image from the GoPro [25] dataset.
Figure 2: GoPro images [25] processed by DeblurGAN. Left: blurred; center: DeblurGAN; right: ground-truth sharp.


We make three contributions. First, we propose a loss and architecture which obtain state-of-the-art results in motion deblurring, while being 5x faster than the fastest competitor. Second, we present a method based on random trajectories for generating a dataset for motion deblurring training in an automated fashion from a set of sharp images. We show that combining it with an existing dataset for motion deblurring learning improves results compared to training on real-world images only. Finally, we present a novel dataset and method for evaluating deblurring algorithms based on how they improve object detection results.

2. Related work
2.1. Image Deblurring

The common formulation of the non-uniform blur model is the following:

$$I_B = k(M) * I_S + N, \qquad (1)$$

where $I_B$ is the blurred image, $k(M)$ are the unknown blur kernels determined by the motion field $M$, $I_S$ is the latent sharp image, $*$ denotes convolution, and $N$ is additive noise. The family of deblurring problems is divided into two types: blind and non-blind deblurring. Early work [3] mostly focused on non-blind deblurring, assuming that the blur kernels $k(M)$ are known. Most of these methods rely on the classical Lucy-Richardson algorithm or on the Wiener or Tikhonov filter to perform the deconvolution operation and obtain an estimate of $I_S$. Commonly the blur function is unknown, and blind deblurring algorithms estimate both the latent sharp image $I_S$ and the blur kernels $k(M)$. Finding a blur function for each pixel is an ill-posed problem, and most of the existing algorithms rely on heuristics, image statistics and assumptions on the sources of the blur. This family of methods addresses the blur caused by camera shake by considering the blur to be uniform across the image. First, the camera motion is estimated in terms of the induced blur kernel, and then the effect is reversed by performing a deconvolution operation. Starting with the success of Fergus et al. [8], many methods [44] [42] [28] [ ] have been developed over the last ten years. Some of the methods are based on an iterative approach [8] [44], which improves the estimate of the motion kernel and sharp image on each iteration by using parametric prior models. However, the running time, as well as the stopping criterion, is a significant problem for those kinds of algorithms. Others assume local linearity of the blur function and use simple heuristics to quickly estimate the unknown kernel. These methods are fast but work well only on a small subset of images.
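To make Eq. (1) and the non-blind setting concrete, here is a minimal one-dimensional sketch under simplifying assumptions: a single known, spatially uniform kernel (rather than per-pixel kernels $k(M)$), zero-padded borders, and Gaussian noise for $N$. It simulates the forward model and inverts it with the classical Richardson-Lucy iteration. All function names are illustrative, not taken from any library or from the paper.

```python
import random

def convolve_same(signal, kernel):
    """1D convolution with zero padding; the output has the length of `signal`.
    `kernel` must have odd length so that it has a center tap."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, kv in enumerate(kernel):
            idx = i + r - j  # kernel is flipped, as in a true convolution
            if 0 <= idx < n:
                acc += kv * signal[idx]
        out.append(acc)
    return out

def blur(sharp, kernel, noise_std=0.0, rng=None):
    """1D forward model of Eq. (1): I_B = k * I_S + N with Gaussian noise N."""
    rng = rng if rng is not None else random.Random(0)
    return [v + rng.gauss(0.0, noise_std) for v in convolve_same(sharp, kernel)]

def richardson_lucy(observed, kernel, iterations=100, init=None, eps=1e-12):
    """Non-blind Richardson-Lucy deconvolution with a known kernel.

    Multiplicative update: est <- est * (k^T * (observed / (k * est))) / (k^T * 1),
    where k^T is convolution with the flipped kernel; the denominator
    compensates for mass lost at the zero-padded borders.
    """
    flipped = kernel[::-1]
    norm = convolve_same([1.0] * len(observed), flipped)
    est = list(init) if init is not None else [0.5] * len(observed)
    for _ in range(iterations):
        reblurred = convolve_same(est, kernel)
        ratio = [o / max(rb, eps) for o, rb in zip(observed, reblurred)]
        correction = convolve_same(ratio, flipped)
        est = [e * c / max(nv, eps) for e, c, nv in zip(est, correction, norm)]
    return est

sharp = [0.2, 0.2, 0.9, 0.9, 0.2, 0.2, 0.5]
kernel = [0.25, 0.5, 0.25]
observed = blur(sharp, kernel)
print(richardson_lucy(observed, kernel, iterations=200))
```

Because the kernel is passed in, this is non-blind deconvolution; a blind method would also have to estimate `kernel`, and `iterations` acts as the stopping criterion whose choice the text identifies as a practical difficulty for iterative methods.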

Recently, Whyte et al. [40] developed a novel algorithm for non-uniform blind deblurring based on a parametrized geometric model of the blurring process in terms of the rotational velocity of the camera during exposure. Similarly, Gupta et al. [12] assumed that the blur is caused only by 3D camera movement. With the success of deep learning over the last few years, several approaches based on convolutional neural networks (CNNs) have appeared. Sun et al. [36] use a CNN to estimate the blur kernel, Chakrabarti [6] predicts complex Fourier coefficients of the motion kernel to perform non-blind deblurring in Fourier space, whereas Gong [9] uses a fully convolutional network for motion flow estimation. All of these approaches use a CNN to estimate the unknown blur function. Recently, kernel-free end-to-end approaches by Noroozi [27] and Nah [25] use a multi-scale CNN to deblur the image directly. Ramakrishnan et al. [29] use the combination of the pix2pix framework [16] and densely connected convolutional networks [ ] to perform blind kernel-free image deblurring. Such methods are able to deal with different sources of blur.

Figure 3: DeblurGAN generator architecture. DeblurGAN contains two strided convolution blocks, nine residual blocks [13] and two transposed convolution blocks. Each ResBlock consists of a convolution layer, an instance normalization layer, and ReLU activation.

2.2. Generative adversarial networks

The idea of generative adversarial networks, introduced by Goodfellow et al. [10], is to define a game between two competing networks: the discriminator and the generator. The generator receives noise as input and generates a sample. The discriminator receives a real and a generated sample and tries to distinguish between them. The goal of the generator is to fool the discriminator by generating perceptually convincing samples that cannot be distinguished from real ones. The game between the generator G and the discriminator D is the minimax objective:

$$\min_G \max_D \; \mathbb{E}_{x \sim \mathbb{P}_r}[\log D(x)] + \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}[\log(1 - D(\tilde{x}))] \qquad (2)$$

where $\mathbb{P}_r$ is the data distribution and $\mathbb{P}_g$ is the model distribution, defined by $\tilde{x} = G(z)$, $z \sim P(z)$, where the input $z$ is a sample from a simple noise distribution. GANs are known for their ability to generate samples of good perceptual quality; however, training the vanilla version suffers from many problems, such as mode collapse and vanishing gradients, as described in [33]. Minimizing the value function in GAN is equivalent (for an optimal discriminator) to minimizing the Jensen-Shannon divergence between the data and model distributions on x.
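The value function in Eq. (2) can be estimated from discriminator outputs on a batch of real and generated samples. The sketch below (pure Python; the helper name `gan_value` is hypothetical) shows two reference points: for a fixed generator the optimal discriminator gives the value $2\,\mathrm{JSD}(\mathbb{P}_r \| \mathbb{P}_g) - \log 4$, so a discriminator that outputs 1/2 everywhere, as it would when $\mathbb{P}_g = \mathbb{P}_r$, gives $-\log 4$, while a perfect discriminator reaches the maximum 0.

```python
import math

def gan_value(d_real, d_fake, eps=1e-12):
    """Monte Carlo estimate of the minimax value in Eq. (2).

    d_real: discriminator outputs D(x) in (0, 1) on real samples x ~ P_r.
    d_fake: discriminator outputs D(x~) in (0, 1) on generated samples x~ = G(z).
    D is trained to maximize this value and G to minimize it.
    """
    real_term = sum(math.log(max(d, eps)) for d in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - d, eps)) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# A discriminator that cannot tell the samples apart (D = 1/2 everywhere).
print(gan_value([0.5, 0.5], [0.5, 0.5]))  # ≈ -1.386, i.e. -log 4
# A perfect discriminator: D = 1 on real samples, D = 0 on generated ones.
print(gan_value([1.0], [0.0]))            # 0.0
```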
Arjovsky et al. [2] discuss the difficulties in GAN training caused by the JS divergence approximation and propose to use the Earth-Mover (also called Wasserstein-1) distance $W(q, p)$. The value function for WGAN is constructed using the Kantorovich-Rubinstein duality [39]:

$$\min_G \max_{D \in \mathcal{D}} \; \mathbb{E}_{x \sim \mathbb{P}_r}[D(x)] - \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}[D(\tilde{x})] \qquad (3)$$

where $\mathcal{D}$ is the set of 1-Lipschitz functions and $\mathbb{P}_g$ is once again the model distribution. The idea here is that the critic value approximates $K \cdot W(\mathbb{P}_r, \mathbb{P}_g)$, where $K$ is a Lipschitz constant and $W(\mathbb{P}_r, \mathbb{P}_g)$ is the Wasserstein distance. In this setting, the discriminator network is called the critic, and it approximates the distance between the samples. To enforce the Lipschitz constraint in WGAN, Arjovsky et al. clip the weights to $[-c, c]$. Gulrajani et al. [11] propose instead to add a gradient penalty term

$$\lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}\left[\left(\|\nabla_{\hat{x}} D(\hat{x})\|_2 - 1\right)^2\right] \qquad (4)$$

to the value function as an alternative way to enforce the Lipschitz constraint, where $\mathbb{P}_{\hat{x}}$ samples uniformly along straight lines between pairs of points drawn from $\mathbb{P}_r$ and $\mathbb{P}_g$. This approach is robust to the choice of generator architecture and requires almost no hyperparameter tuning. This is crucial for image deblurring, as it allows using novel lightweight neural network architectures instead of the standard deep ResNet architectures previously used for image deblurring [25].
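A minimal sketch of the WGAN-GP critic objective, that is, the negated value of Eq. (3) plus the penalty of Eq. (4), on toy 2D points. Assumptions: the critic is a plain Python function, its input gradient is approximated by central finite differences instead of automatic differentiation, and the penalty weight defaults to λ = 10, the value suggested by Gulrajani et al. (this text does not state the weight used for the penalty term). `toy_critic` is hypothetical.

```python
import math
import random

def grad_norm(f, x, h=1e-5):
    """L2 norm of the gradient of a scalar function f at point x (central differences)."""
    sq = 0.0
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        g = (f(xp) - f(xm)) / (2.0 * h)
        sq += g * g
    return math.sqrt(sq)

def wgan_gp_critic_loss(critic, real, fake, lam=10.0, rng=None):
    """Critic loss to minimize: E[D(x~)] - E[D(x)] + lam * E[(||grad D(x^)||_2 - 1)^2].

    The first two terms are the negated WGAN value of Eq. (3); the last is the
    gradient penalty of Eq. (4), evaluated at random points x^ on the segments
    between paired real and generated samples.
    """
    rng = rng if rng is not None else random.Random(0)
    n = len(real)
    wgan_term = sum(critic(x) for x in fake) / n - sum(critic(x) for x in real) / n
    penalty = 0.0
    for x_r, x_f in zip(real, fake):
        t = rng.random()
        x_hat = [t * a + (1.0 - t) * b for a, b in zip(x_r, x_f)]
        penalty += (grad_norm(critic, x_hat) - 1.0) ** 2
    return wgan_term + lam * penalty / n

def toy_critic(x):
    """Hypothetical nonlinear critic on 2D points, for illustration only."""
    return 0.8 * x[0] - 0.6 * x[1] + 0.3 * math.tanh(x[0] * x[1])

real = [[1.0, 0.5], [0.8, 0.9]]
fake = [[0.1, 0.2], [0.3, -0.4]]
print(wgan_gp_critic_loss(toy_critic, real, fake))
```

A critic whose gradient norm is exactly 1 incurs no penalty, while a steeper critic is penalized; this is how the penalty softly enforces the 1-Lipschitz constraint without weight clipping.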

2.3. Conditional adversarial networks

Generative adversarial networks have been applied to different image-to-image translation problems, such as super-resolution [20], style transfer [22], product photo generation [ ] and others. Isola et al. [16] provide a detailed overview of those approaches and present a conditional GAN architecture, also known as pix2pix. Unlike the vanilla GAN, a cGAN learns a mapping from an observed image x and a random noise vector z to y: G : x, z → y. Isola et al. also condition the discriminator and use a U-net architecture [31] for the generator and a Markovian discriminator, which allows achieving perceptually superior results on many tasks, including synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images.

3. The proposed method

The goal is to recover the sharp image $I_S$ given only a blurred image $I_B$ as input, so no information about the blur kernel is provided. Deblurring is done by the trained CNN $G_{\theta_G}$, to which we refer as the generator. For each $I_B$ it estimates the corresponding $I_S$ image. In addition, during the training phase, we introduce the critic network $D_{\theta_D}$ and train both networks in an adversarial manner.

3.1. Loss function

We formulate the loss function as a combination of content and adversarial loss:

$$\mathcal{L} = \underbrace{\underbrace{\mathcal{L}_{GAN}}_{\text{adv loss}} + \underbrace{\lambda \cdot \mathcal{L}_X}_{\text{content loss}}}_{\text{total loss}} \qquad (5)$$

where λ equals 100 in all experiments. Unlike Isola et al. [16], we do not condition the discriminator, as we do not need to penalize mismatch between the input and output.

Adversarial loss. Most papers related to conditional GANs use the vanilla GAN objective as the loss function [20] [25]. Recently, [ ] provided an alternative way of using the least-squares GAN [23], which is more stable and generates higher-quality results. We use WGAN-GP [11] as the critic function, which has been shown to be robust to the choice of generator architecture [11]. Our preliminary experiments with different architectures confirmed these findings, and we are able to use an architecture much lighter than ResNet152 [25]; see the next subsection. The loss is calculated as follows:

$$\mathcal{L}_{GAN} = \sum_{n=1}^{N} -D_{\theta_D}(G_{\theta_G}(I_B)) \qquad (6)$$

DeblurGAN trained without the GAN component converges, but produces smooth and blurry images.
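The sign conventions of Eqs. (5) and (6) are easy to misread, so here is a small sketch with scalar stand-ins: the generator's adversarial term is the negated critic score of each restored image, so a higher score lowers the loss, and the content loss is weighted by λ = 100. The function names are hypothetical; in an actual implementation the scores and the content loss would be tensors produced by the critic and the perceptual loss.

```python
def adversarial_loss(critic_scores):
    """Eq. (6): sum over the batch of -D(G(I_B)), given critic scores on restored images."""
    return sum(-s for s in critic_scores)

def total_loss(critic_scores, content_loss, lam=100.0):
    """Eq. (5): adversarial loss plus lambda times the content loss (lambda = 100)."""
    return adversarial_loss(critic_scores) + lam * content_loss

# Batch size 1: a critic score of 0.3 on the restored image and a content loss of 0.002.
print(total_loss([0.3], 0.002))
```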

Figure 4: DeblurGAN training. The generator network takes the blurred image as input and produces the estimate of the sharp image. The critic network takes the restored and sharp images and outputs a distance between them. The total loss consists of the WGAN loss from the critic and the perceptual loss [17]. The perceptual loss is the difference between the VGG-19 [34] conv3.3 feature maps of the sharp and restored images. At test time, only the generator is kept.

Figure 5: Examples of generated camera motion trajectories, the corresponding blur kernels and the corresponding blurred images.

Content loss. Two classical choices for the "content" loss function are the L1 (MAE) loss and the L2 (MSE) loss on raw pixels. Using those functions as the sole optimization target leads to blurry artifacts on generated images due to the pixel-wise average of possible solutions in the pixel space [20]. Instead, we adopt the recently proposed perceptual loss [17]. The perceptual loss is a simple L2 loss, but based on the difference between the CNN feature maps of the generated and target images. It is defined as follows:

$$\mathcal{L}_X = \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}} \left(\phi_{i,j}(I_S)_{x,y} - \phi_{i,j}(G_{\theta_G}(I_B))_{x,y}\right)^2 \qquad (7)$$

where $\phi_{i,j}$ is the feature map obtained by the j-th convolution (after activation) before the i-th max-pooling layer within the VGG19 network, pretrained on ImageNet [7], and $W_{i,j}$ and $H_{i,j}$ are the dimensions of the feature maps. In our work we use activations from the $VGG_{3,3}$ convolutional layer. The activations of deeper layers represent features of a higher abstraction [46] [20]. The perceptual loss focuses on restoring general content [16] [20], while the adversarial loss focuses on restoring texture details. DeblurGAN trained without the perceptual loss, or with simple MSE on pixels instead, does not converge to a meaningful state.
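The perceptual loss reduces to a mean squared difference between two feature maps. The sketch below computes it for a single 2D feature map. Running VGG19 would require a deep learning framework, so `toy_phi` (2×2 average pooling followed by ReLU) is a hypothetical stand-in for $\phi_{3,3}$, used only to show where the feature extractor plugs in. Real VGG feature maps also have a channel dimension, which this sketch omits, as the equation above does.

```python
def toy_phi(img):
    """Hypothetical stand-in for a VGG19 feature map phi_{i,j}:
    2x2 average pooling followed by ReLU on a 2D image (list of rows)."""
    out = []
    for y in range(0, len(img) - 1, 2):
        row = []
        for x in range(0, len(img[0]) - 1, 2):
            s = (img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4.0
            row.append(max(s, 0.0))
        out.append(row)
    return out

def perceptual_loss(phi, sharp, restored):
    """Mean over the W x H feature map of (phi(I_S)[x, y] - phi(G(I_B))[x, y])^2."""
    f_s, f_r = phi(sharp), phi(restored)
    h, w = len(f_s), len(f_s[0])
    total = sum((f_s[y][x] - f_r[y][x]) ** 2 for y in range(h) for x in range(w))
    return total / (w * h)

sharp = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
restored = [[0.0] * 4, [0.0] * 4]
print(perceptual_loss(toy_phi, sharp, restored))  # 0.5
```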

Additional regularization. We also tried adding TV regularization; the model trained with it performs worse: 27.9 dB vs. 28.7 dB PSNR without it on the GoPro dataset.
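For reference, a TV regularizer penalizes differences between neighbouring pixels. The text does not specify the variant or its weight; the sketch assumes the common anisotropic form (sum of absolute horizontal and vertical differences) on a single-channel image, which would be added to the total loss with some hypothetical weight.

```python
def total_variation(img):
    """Anisotropic total variation of a 2D image (list of rows): the sum of absolute
    differences between horizontally and vertically adjacent pixels."""
    h, w = len(img), len(img[0])
    tv = 0.0
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                tv += abs(img[y][x + 1] - img[y][x])
            if y + 1 < h:
                tv += abs(img[y + 1][x] - img[y][x])
    return tv

print(total_variation([[0, 1], [0, 1]]))  # 2.0: one vertical edge, two rows
```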

3.2. Network architecture

The generator CNN architecture is shown in Figure 3. It is similar to the one proposed by Johnson et al. [17] for the style transfer task. It contains two strided convolution blocks, nine residual blocks [13] (ResBlocks) and two transposed convolution blocks. Each ResBlock consists of a convolution layer, an instance normalization layer [38], and ReLU [26] activation. Dropout [35] regularization with a probability of 0.5 is added after the first convolution layer in each ResBlock. In addition, we introduce a global skip connection, which we refer to as ResOut. The CNN learns a residual correction $I_R$ to the blurred image $I_B$, so $I_S = I_B + I_R$. We find that this formulation makes training faster and the resulting model generalizes better. During the training phase, we define a critic network $D_{\theta_D}$, which is a Wasserstein GAN [2] with gradient penalty [11], to which we refer as WGAN-GP. The architecture of the critic network is identical to PatchGAN [16, 22]. All the convolutional layers except the last are followed by an InstanceNorm layer and Leaky ReLU [ ] with α = 0.2.
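Because the critic is a PatchGAN, each output unit scores only a patch of the input image. The sketch computes that patch size (the receptive field) for a stack of convolutions. The layer list is an assumption: it is the standard 70×70 PatchGAN configuration from pix2pix (five 4×4 convolutions with strides 2, 2, 2, 1, 1); the exact configuration of the DeblurGAN critic is not given in this text.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit of a stack of convolutions.

    layers: list of (kernel_size, stride) pairs from input to output.
    Walks backwards from a single output unit: r_in = (r_out - 1) * stride + kernel.
    """
    r = 1
    for k, s in reversed(layers):
        r = (r - 1) * s + k
    return r

# Assumed 70x70 PatchGAN layout: 4x4 convolutions with strides 2, 2, 2, 1, 1.
PATCHGAN_70 = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(PATCHGAN_70))  # 70
```

Padding does not change the receptive field, which is why it is not part of the layer description here.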

4. Motion blur generation

There is no easy method to obtain pairs of corresponding sharp and blurred images for training. A typical approach is to use a high frame-rate camera and simulate blur by averaging sharp frames from video [27, 25]. This creates realistic blurred images, but limits the image space to scenes present in the captured videos and makes it complicated to scale the dataset. Sun et al. [36] create synthetically blurred images by convolving clean natural images with one out of 73 possible linear motion kernels; Xu et al. [43] also use linear motion kernels to create synthetically blurred images.
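A linear motion kernel of the kind used in [36] and [43] can be approximated by rasterizing a straight segment of a given length and angle and normalizing it to unit sum. This is an illustrative sketch, not the kernel generator of either work; nearest-pixel rasterization, the angle in degrees and the odd default kernel size are assumptions.

```python
import math

def linear_motion_kernel(length, angle_deg, size=None, samples_per_px=10):
    """Normalized blur kernel for straight-line motion of `length` pixels at `angle_deg`.

    The segment is centered in a size x size grid and rasterized by nearest-pixel
    sampling; size defaults to the smallest odd integer >= length.
    """
    if size is None:
        size = int(math.ceil(length))
        if size % 2 == 0:
            size += 1
    if size < length:
        raise ValueError("kernel size must be at least the motion length")
    c = size // 2
    k = [[0.0] * size for _ in range(size)]
    n = max(2, int(length * samples_per_px))
    dx = math.cos(math.radians(angle_deg))
    dy = math.sin(math.radians(angle_deg))
    for i in range(n):
        t = (i / (n - 1) - 0.5) * (length - 1)  # offset along the segment, centered at 0
        col = int(round(c + t * dx))
        row = int(round(c - t * dy))  # image rows grow downwards
        k[row][col] += 1.0
    total = sum(map(sum, k))
    return [[v / total for v in r] for r in k]

for row in linear_motion_kernel(5, 45.0):
    print(" ".join(f"{v:.2f}" for v in row))
```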


Figure 6: Top row: blur kernels from real-world images estimated by Fergus et al. [8]. Bottom row: kernels synthetically generated by our method. Our randomized method can simulate a wide variety of realistic blur kernels with different levels of non-linearity.

Chakrabarti [6] creates a blur kernel by sampling 6 random points and fitting a spline to them. We take a step further and propose a method that simulates more realistic and complex blur kernels. We follow the idea of random trajectory generation described by Boracchi and Foi [4]. The kernels are then generated by applying sub-pixel interpolation to the trajectory vector. Each trajectory vector is a complex-valued vector, which corresponds to the discrete positions of an object following 2D random motion in a continuous domain. Trajectory generation is done by a Markov process, summarized in Algorithm 1. The position of the next point of the trajectory is randomly generated based on the velocity and position of the previous point, a Gaussian perturbation, an impulse perturbation and a deterministic inertial component.
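The trajectory generation summarized in Algorithm 1 and the subsequent sub-pixel interpolation can be sketched in pure Python as follows. Assumptions: random quantities are drawn with Python's `random.Random`; the velocity update follows step 10 with an inertial pull of -I·x[t] towards the starting point; and the sub-pixel interpolation is taken to be bilinear splatting into a 64×64 grid, which is one reasonable choice rather than necessarily the authors' exact scheme. The resulting kernel would then be convolved with the sharp image (step 15).

```python
import cmath
import math
import random

def motion_trajectory(M=2000, L_max=60.0, p_s=0.001, rng=None):
    """Random 2D camera trajectory as a list of M complex positions (Algorithm 1, steps 2-13)."""
    rng = rng if rng is not None else random.Random()
    inertia = rng.uniform(0.0, 0.7)        # I: inertia term
    p_b = rng.uniform(0.0, 0.2)            # probability of big shake
    p_g = rng.uniform(0.0, 0.7)            # Gaussian shake term
    phi = rng.uniform(0.0, 2.0 * math.pi)  # initial angle
    step = L_max / (M - 1)                 # constant per-step length
    v = complex(math.cos(phi), math.sin(phi)) * step
    x = [0j] * M
    for t in range(M - 1):
        if rng.random() < p_b * p_s:
            # impulsive big shake: push the velocity roughly into the opposite direction
            next_dir = 2 * v * cmath.exp(1j * (math.pi + (rng.random() - 0.5)))
        else:
            next_dir = 0
        gaussian = complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        # random perturbation minus an inertial pull towards the starting point
        dv = next_dir + p_s * (p_g * gaussian - inertia * x[t]) * step
        v = v + dv
        v = v / abs(v) * step  # renormalize the speed
        x[t + 1] = x[t] + v
    return x

def trajectory_to_kernel(x, size=64):
    """Bilinear (sub-pixel) splatting of a complex trajectory into a size x size kernel."""
    xs = [p.real for p in x]
    ys = [p.imag for p in x]
    # shift the trajectory so that its bounding box is centered in the grid
    ox = (size - 1) / 2.0 - (min(xs) + max(xs)) / 2.0
    oy = (size - 1) / 2.0 - (min(ys) + max(ys)) / 2.0
    k = [[0.0] * size for _ in range(size)]
    for px, py in zip(xs, ys):
        px, py = px + ox, py + oy
        x0, y0 = math.floor(px), math.floor(py)
        fx, fy = px - x0, py - y0
        # distribute unit mass over the 4 neighbouring pixels
        for dy, wy in ((0, 1.0 - fy), (1, fy)):
            for dx, wx in ((0, 1.0 - fx), (1, fx)):
                xi, yi = x0 + dx, y0 + dy
                if 0 <= xi < size and 0 <= yi < size:
                    k[yi][xi] += wx * wy
    total = sum(map(sum, k))
    return [[val / total for val in row] for row in k]

kernel = trajectory_to_kernel(motion_trajectory(rng=random.Random(0)))
print(sum(1 for row in kernel for v in row if v > 0), "non-zero kernel taps")
```

With a path length of L_max = 60 the bounding box is at most 60 pixels wide, so the whole trajectory fits in the 64×64 grid.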

Algorithm 1 Motion blur kernel generation.

Parameters:
M = 2000: number of iterations,
L_max = 60: max length of the movement,
p_s = 0.001: probability of impulsive shake,
I: inertia term, uniform from (0, 0.7),
p_b: probability of big shake, uniform from (0, 0.2),
p_g: probability of gaussian shake, uniform from (0, 0.7),
φ: initial angle, uniform from (0, 2π),
x: trajectory vector,
rand / randn: a uniform sample from [0, 1) / a standard normal sample.

1: procedure BLUR(Img, M, L_max, p_s)
2:   v_0 ← cos(φ) + sin(φ) * i
3:   v ← v_0 * L_max/(M - 1)
4:   x = zeros(M, 1)
5:   for t = 1 to M - 1 do
6:     if rand < p_b * p_s then
7:       nextDir ← 2 · v · e^(i * (π + (rand - 0.5)))
8:     else:
9:       nextDir ← 0
10:    dv ← nextDir + p_s * (p_g * (randn + i * randn) - I * x[t]) * (L_max/(M - 1))
11:    v ← v + dv
12:    v ← (v/abs(v)) * L_max/(M - 1)
13:    x[t + 1] ← x[t] + v
14:  Kernel ← sub-pixel interpolation(x)
15:  Blurred image ← conv(Kernel, Img)
16:  return Blurred image

Figure 7: Results on the GoPro test dataset. From left to right: blurred photo, Nah et al. [25], DeblurGAN.

Figure 8: Results on the Kohler dataset. From left to right: blurred photo, Nah et al. [25], DeblurGAN.

5. Training Details

We implemented all of our models using the PyTorch [1] deep learning framework. The training was performed on a single Maxwell GTX Titan-X GPU using three different datasets. The first model, to which we refer as DeblurGAN WILD, was trained on random crops of size 256x256 from 1000 GoPro training dataset images [25] downscaled by a factor of two. The second one, DeblurGAN Synth, was trained on 256x256 patches from the MS COCO dataset blurred by the method presented in the previous section. We also trained DeblurGAN Comb on a combination of synthetically blurred images and images taken in the wild, where the ratio of synthetically generated images to images taken by a high frame-rate camera is 2:1. As the models are fully convolutional and are trained on image patches, they can be applied to images of arbitrary size. For optimization we follow the approach of [ ] and perform 5 gradient descent steps on $D_{\theta_D}$, then one step on $G_{\theta_G}$, using Adam [18] as a solver. The learning rate is initially set to $10^{-4}$ for both generator and critic. After the first 150 epochs we linearly decay the rate to zero over the next 150 epochs. At inference time we follow the idea of [16] and apply both dropout and instance normalization. All the models were trained with a batch size of 1, which empirically showed better results on validation. Training one DeblurGAN network took 6 days.
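Two details of the optimization schedule are easy to encode: the learning rate is constant for 150 epochs and then decays linearly to zero over the next 150, and each generator step is preceded by 5 critic steps. The sketch below assumes 0-based epoch indices and a decay that reaches exactly zero at epoch 300; these boundary conventions and the helper names are assumptions.

```python
def learning_rate(epoch, base_lr=1e-4, constant_epochs=150, decay_epochs=150):
    """Learning rate for a 0-based epoch index: constant, then linear decay to zero."""
    if epoch < constant_epochs:
        return base_lr
    remaining = 1.0 - (epoch - constant_epochs) / decay_epochs
    return base_lr * max(0.0, remaining)

def update_order(generator_steps, n_critic=5):
    """Sequence of updates: n_critic critic ('D') steps before every generator ('G') step."""
    order = []
    for _ in range(generator_steps):
        order.extend(["D"] * n_critic)
        order.append("G")
    return order

print([learning_rate(e) for e in (0, 150, 225, 300)])
print("".join(update_order(2)))  # DDDDDGDDDDDG
```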

6. Experimental evaluation

6.1. GoPro Dataset

Table 1: Peak signal-to-noise ratio and structural similarity measure, mean over the GoPro test dataset of 1111 images. All models were tested on the linear image subset. State-of-the-art results (*) by Nah et al. [25] were obtained on the gamma subset.

Metric | Sun et al. [36] | Nah et al. [25] | Xu et al. [44] | DeblurGAN WILD | DeblurGAN Synth | DeblurGAN Comb
PSNR | 24.6 | 28.3/29.1* | 25.1 | 27.2 | 23.6 | 28.7
SSIM | 0.842 | 0.916 | 0.89 | 0.954 | 0.884 | 0.958
Time | 20 min | 4.33 s | 13.41 s | 0.85 s (all DeblurGAN variants)

GoPro dataset [25] consists of 2103 pairs of blurred and sharp images in 720p quality, taken from various scenes. We compare the results of our models with state-of-the-art models [36], [25] on standard metrics and also show the running time of each algorithm on a single GPU. Results are in Table 1. DeblurGAN shows superior results in terms of structural similarity, is close to the state of the art in peak signal-to-noise ratio, and provides better-looking results by visual inspection. In contrast to other neural models, our network does not use L2 distance in pixel space, so it is not directly optimized for the PSNR metric. It can handle blur caused by camera shake and object movement, does not suffer from the usual artifacts of kernel estimation methods, and at the same time has more than 6x fewer parameters than the Multi-scale CNN [25], which heavily speeds up inference. Deblurred images from the GoPro test set are shown in Figure 7.
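PSNR is a monotone function of the pixel-wise MSE, which is why a network trained with an L2 pixel loss is directly optimized for it, while DeblurGAN is not. A minimal sketch for images given as flat lists of values in [0, 1]; the `max_val` default is an assumption (8-bit images would use 255).

```python
import math

def psnr(reference, estimate, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images given as flat lists of pixels."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# Halving the RMSE always adds about 6.02 dB, whatever the image content.
print(psnr([0.0] * 4, [0.1] * 4))   # ≈ 20.0
print(psnr([0.0] * 4, [0.05] * 4))  # ≈ 26.02
```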

6.2. Kohler dataset

Kohler dataset [19] consists of 4 images, each blurred with 12 different kernels. This is a standard benchmark dataset for the evaluation of blind deblurring algorithms. The dataset is generated by recording and analyzing real camera motion, which is played back on a robot platform such that a sequence of sharp images is recorded sampling the 6D camera motion trajectory. Results, computed with the same metrics as for GoPro, are in Table 2.

6.3. Object Detection benchmark on YOLO

Object detection is one of the most well-studied problems in computer vision, with applications in domains ranging from autonomous driving to security. During the last few years, approaches based on deep convolutional neural networks have shown state-of-the-art performance compared to traditional methods. However, those networks are trained on limited datasets, and in real-world settings images are often degraded by different artifacts, including motion blur. Similar to [21] and [32], we studied the influence of motion blur on object detection and propose a new way to evaluate the quality of a deblurring algorithm based on the results of object detection with a pretrained YOLO [30] network.

Table 2: Peak signal-to-noise ratio and structural similarity measure, mean on the Kohler dataset. Xu et al. [44] and Whyte et al. [40] are non-CNN blind deblurring methods, whereas Sun et al. [36] and Nah et al. [25] use CNN.

Metric | Sun et al. [36] | Nah et al. [25] | Xu et al. [44] | Whyte et al. [40] | DeblurGAN WILD | DeblurGAN Synth | DeblurGAN Comb
PSNR | 25.22 | 26.48 | 27.47 | 27.03 | 26.10 | 25.67 | 25.86
SSIM | 0.773 | 0.807 | 0.811 | 0.809 | 0.816 | 0.792 | 0.802

Figure 9: YOLO object detection before and after deblurring. (a) Blurred photo, (b) Nah et al. [25], (c) DeblurGAN, (d) sharp photo.

For this, we constructed a dataset of sharp and blurred street views by simulating camera shake using a high frame-rate video camera. Following [14] [25] [27], we take a random number of frames, between 5 and 25, captured by a 240 fps camera and compute the blurred version of the middle frame as the average of those frames. All the frames are gamma-corrected with γ = 2.2, and then the inverse function is applied to obtain the final blurred frame. Overall, the dataset consists of 410 pairs of blurred and sharp images, taken in streets and parking places with different numbers and types of cars.
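The blur synthesis described above can be sketched as follows, assuming frames are flat lists of gamma-encoded intensities in [0, 1]: each frame is linearized with $v^{\gamma}$, the frames are averaged, and the average is re-encoded with $v^{1/\gamma}$. `make_pair` is hypothetical: it draws the window length uniformly from 5 to 25 and returns the middle frame of the window as the sharp target.

```python
import random

def synthesize_blur(frames, gamma=2.2):
    """Average consecutive sharp frames in linear space.

    frames: list of frames, each a flat list of gamma-encoded values in [0, 1].
    """
    linear = [[v ** gamma for v in f] for f in frames]
    mean = [sum(px) / len(frames) for px in zip(*linear)]
    return [v ** (1.0 / gamma) for v in mean]

def make_pair(video, rng, gamma=2.2, lo=5, hi=25):
    """Pick a random window of lo..hi consecutive frames; return (sharp middle frame, blurred frame)."""
    n = rng.randint(lo, hi)
    start = rng.randrange(0, len(video) - n + 1)
    window = video[start:start + n]
    return window[n // 2], synthesize_blur(window, gamma)

# Averaging a black and a white frame gives 0.5 ** (1 / 2.2) ≈ 0.73, not 0.5.
print(synthesize_blur([[0.0], [1.0]]))
```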

Blur sources include both camera shake and blur caused by car movement. The dataset and supplementary code are available online. The sharp images are fed into the YOLO network, and the result, after visual verification, is assigned as ground truth. YOLO is then run on the blurred and recovered versions of the images, and the average recall and precision between the obtained results and the ground truth are calculated. This approach reflects the quality of deblurring models on real-life problems and correlates with the visual quality and sharpness of the generated images, in contrast to the standard PSNR metric. Precision is, in general, higher on blurry images, as there are no sharp object boundaries and smaller objects are not detected, as shown in Figure 9.

Results are shown in Table 3. DeblurGAN significantly outperforms its competitors in terms of recall and F1 score.
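The evaluation protocol needs a rule for matching detections on blurred or restored images to the reference detections on sharp images, which the text does not specify. The sketch below assumes greedy one-to-one matching by class label with an IoU threshold of 0.5, and derives precision, recall and F1 = 2PR/(P + R). As a sanity check, `f1_score(0.821, 0.437)` gives about 0.570, consistent with the "no deblur" row of Table 3.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0

def detection_scores(pred, truth, iou_thr=0.5):
    """Precision, recall and F1 of detections `pred` against reference detections `truth`.

    Each detection is (label, (x1, y1, x2, y2)); matching is greedy and one-to-one.
    """
    unmatched = list(truth)
    tp = 0
    for label, box in pred:
        best, best_iou = None, iou_thr
        for cand in unmatched:
            if cand[0] == label:
                overlap = iou(box, cand[1])
                if overlap >= best_iou:
                    best, best_iou = cand, overlap
        if best is not None:
            unmatched.remove(best)
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall, f1_score(precision, recall)

truth = [("car", (0, 0, 10, 10)), ("car", (20, 20, 30, 30))]
print(detection_scores([("car", (1, 1, 10, 10))], truth))  # one car found, one missed
print(round(f1_score(0.821, 0.437), 3))                    # 0.57
```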

Table 3: Results of YOLO [30] object detection on blurred and restored photos using the DeblurGAN and Nah et al. [25] algorithms. Results on the corresponding sharp images are considered ground truth. DeblurGAN has higher recall and F1 score than its competitors.

Method | prec. | recall | F1 score
no deblur | 0.821 | 0.437 | 0.570
Nah et al. [25] | 0.834 | 0.552 | 0.665
DeblurGAN WILD | 0.764 | 0.631 | 0.691
DeblurGAN Synth | 0.801 | 0.517 | 0.628
DeblurGAN Comb | 0.671 | 0.742 | 0.704

7. Conclusion

We described a kernel-free blind motion deblurring learning approach and introduced DeblurGAN, a conditional adversarial network optimized using a multi-component loss function. In addition, we implemented a new method for creating realistic synthetic motion blur that is able to model different blur sources. We introduced a new benchmark and evaluation protocol based on the results of object detection and showed that DeblurGAN significantly helps detection on blurred images.